AITopics | final verdict

ALARB: An Arabic Legal Argument Reasoning Benchmark

Shairah, Harethah Abu, AlHarbi, Somayah, AlHussein, Abdulaziz, Alsabea, Sameer, Shaqaqi, Omar, AlShamlan, Hebah, Knio, Omar, Turkiyyah, George

arXiv.org Artificial Intelligence, Oct-2-2025

We introduce ALARB, a dataset and suite of tasks designed to evaluate the reasoning capabilities of large language models (LLMs) within the Arabic legal domain. While existing Arabic benchmarks cover some knowledge-intensive tasks such as retrieval and understanding, substantial datasets focusing specifically on multistep reasoning for Arabic LLMs, especially in open-ended contexts, are lacking. The dataset comprises over 13K commercial court cases from Saudi Arabia, with each case including the facts presented, the reasoning of the court, the verdict, as well as the cited clauses extracted from the regulatory documents. We define a set of challenging tasks leveraging this dataset and reflecting the complexity of real-world legal reasoning, including verdict prediction, completion of reasoning chains in multistep legal arguments, and identification of relevant regulations based on case facts. We benchmark a representative selection of current open and closed Arabic LLMs on these tasks and demonstrate the dataset's utility for instruction tuning. Notably, we show that instruction-tuning a modest 12B parameter model using ALARB significantly enhances its performance in verdict prediction and Arabic verdict generation, reaching a level comparable to that of GPT-4o.
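As a rough illustration of the case structure and the verdict prediction task described above, here is a minimal sketch. The class name `CourtCase`, its field names, and the exact-match scoring rule are all assumptions made for illustration; the abstract does not specify the dataset schema or the evaluation metric.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class CourtCase:
    """Hypothetical record mirroring the four parts of a case named in the abstract."""
    facts: str                      # facts presented to the court
    reasoning: str                  # the court's reasoning
    verdict: str                    # the final verdict
    cited_clauses: list[str] = field(default_factory=list)  # cited regulatory clauses


def verdict_accuracy(cases: Iterable[CourtCase],
                     predict: Callable[[str], str]) -> float:
    """Fraction of cases where the predicted verdict exactly matches the recorded one.

    Exact string match is an assumed, simplified metric, not the paper's.
    """
    cases = list(cases)
    if not cases:
        return 0.0
    correct = sum(
        1 for case in cases
        if predict(case.facts).strip() == case.verdict.strip()
    )
    return correct / len(cases)
```

Here `predict` stands in for any model call that maps case facts to a verdict string; open-ended verdict generation would need a softer comparison than exact match.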

large language model, machine learning, natural language

arXiv:2510.00694

Country:

Europe (1.00)
North America > United States (0.68)
Asia > Middle East > Saudi Arabia (0.49)

Genre: Research Report (0.50)

Industry: Law > Litigation (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

ProdRev: A DNN framework for empowering customers using generative pre-trained transformers

Gupta, Aakash, Das, Nataraj

arXiv.org Artificial Intelligence, May-21-2025

Following the pandemic, customers' preference for e-commerce has accelerated. Because so much information is spread across multiple reviews (sometimes running into the thousands) for a single product, it can create decision paralysis for the buyer. This disempowers the consumer, who cannot be expected to go through so many reviews, since doing so is time-consuming and can be confusing. Various commercial tools are available that use a scoring mechanism to arrive at an adjusted score, which can alert the user to potential review manipulation. This paper proposes a framework that fine-tunes a generative pre-trained transformer to understand these reviews better and to use "common sense" to make better decisions. These models have more than 13 billion parameters. To fine-tune the model for our requirement, we use the curie engine from the generative pre-trained transformer (GPT-3). By using generative models, we introduce abstractive summarization instead of a simple extractive method of summarizing the reviews. This brings out the true relationship between the reviews rather than simply copying and pasting. It introduces an element of "common sense" for the user and helps them quickly make the right decisions. The user is provided with the pros and cons from the processed reviews, so the user/customer can make their own decisions.

large language model, machine learning, natural language

doi: 10.1109/DASA54658.2022.9765232

arXiv:2505.13491

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > India > Tripura > Agartala (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre:

Overview (0.48)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

GRP: Goal-Reversed Prompting for Zero-Shot Evaluation with LLMs

Song, Mingyang, Zheng, Mao, Luo, Xuan

arXiv.org Artificial Intelligence, Mar-8-2025

Using Large Language Models (LLMs) to evaluate and compare two answers from different models typically involves having LLM-based judges select the better answer. However, humans often approach problem-solving from a reverse perspective, for instance, by choosing the worse option instead of the better one in a pairwise comparison. This kind of reverse thinking plays a crucial role in human reasoning and decision-making, and it also makes it possible to compare the original and reverse thought processes. Motivated by this, we propose a Goal-Reversed Prompting (GRP) approach for pairwise evaluation that shifts the original task from selecting the better answer to choosing the worse one. We encourage LLMs to think in reverse by prompting them to identify the worse response. Experiments on closed-source models demonstrate that GRP significantly enhances evaluation capabilities, outperforming the prompt template with the original goal.
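The goal reversal described above can be sketched in a few lines: the judge is asked for the worse response, and the other response is then treated as the preferred one. The prompt wording, the function names, and the single-letter reply format are assumptions for illustration, not the authors' actual template.

```python
from typing import Callable

# Hypothetical goal statements; the paper's real prompt templates are not given here.
ORIGINAL_GOAL = "Which response is better? Answer with 'A' or 'B'."
REVERSED_GOAL = "Which response is worse? Answer with 'A' or 'B'."


def build_prompt(question: str, answer_a: str, answer_b: str, reverse: bool) -> str:
    """Build a pairwise-comparison prompt with the original or the reversed goal."""
    goal = REVERSED_GOAL if reverse else ORIGINAL_GOAL
    return (
        f"Question: {question}\n\n"
        f"Response A: {answer_a}\n\n"
        f"Response B: {answer_b}\n\n"
        f"{goal}"
    )


def preferred_answer(judge_reply: str, reverse: bool) -> str:
    """Map the judge's reply to the preferred response label ('A' or 'B')."""
    choice = judge_reply.strip().upper()[:1]
    if choice not in ("A", "B"):
        raise ValueError(f"unexpected judge reply: {judge_reply!r}")
    if reverse:
        # The judge named the worse response, so the other one is preferred.
        return "B" if choice == "A" else "A"
    return choice


def judge_pair(judge: Callable[[str], str], question: str,
               answer_a: str, answer_b: str, reverse: bool = True) -> str:
    """Run one pairwise evaluation; `judge` is any callable wrapping an LLM call."""
    prompt = build_prompt(question, answer_a, answer_b, reverse)
    return preferred_answer(judge(prompt), reverse)
```

Calling `judge_pair` with `reverse=False` gives the conventional "pick the better answer" setup, so both prompt variants can be run on the same pairs and compared.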

evaluation, language model, template

arXiv:2503.06139

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Spain (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Saha, Swarnadeep, Li, Xian, Ghazvininejad, Marjan, Weston, Jason, Wang, Tianlu

arXiv.org Artificial Intelligence, Jan-29-2025

LLM-as-a-Judge models generate chain-of-thought (CoT) sequences intended to capture the step-by-step reasoning process that underlies the final evaluation of a response. However, due to the lack of human-annotated CoTs for evaluation, the required components and structure of effective reasoning traces remain understudied. Consequently, previous approaches often (1) constrain reasoning traces to hand-designed components, such as a list of criteria, reference answers, or verification questions, and (2) structure them such that planning is intertwined with the reasoning for evaluation. In this work, we propose EvalPlanner, a preference optimization algorithm for Thinking-LLM-as-a-Judge that first generates an unconstrained evaluation plan, followed by its execution, and then the final judgment. In a self-training loop, EvalPlanner iteratively optimizes over synthetically constructed evaluation plans and executions, leading to better final verdicts. Our method achieves new state-of-the-art performance for generative reward models on RewardBench (with a score of 93.9), despite being trained on fewer, synthetically generated preference pairs. Additional experiments on other benchmarks like RM-Bench, JudgeBench, and FollowBenchEval further highlight the utility of both planning and reasoning for building robust LLM-as-a-Judge reasoning models.

evalplanner, large language model, machine learning

arXiv:2501.18099

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

'Roswell: The Final Verdict' Review: Aliens vs. Artificial Intelligence

#artificialintelligence, Jul-3-2021, 14:20:04 GMT

The recent emergence of U.S. Navy videos of UFOs--and the fact that the government is addressing them seriously--will no doubt generate larger than average buzz around "Roswell: The Final Verdict," although the title suggests something like "Final Destination 6": Will the question of intergalactic life ever really be resolved until extraterrestrials can walk comfortably among us? "Final Verdict" is hooked to the 74th anniversary of the incidents at Roswell. It's safe to expect similar celebrations next year. Meanwhile, this Discovery production is an ambitious if somewhat overheated summing-up of what happened near the New Mexico city in 1947, the stuff of both scientific speculation and folklore: Did the government cover up the crash landing of an alien spaceship, replete with otherworldly visitors? Or did the "witnesses" who claimed that it all happened construct an elaborate hoax?

artificial intelligence, final verdict, roswell

Country:

North America > United States > New Mexico (0.27)
North America > Mexico > Mexico City > Mexico City (0.27)

Industry:

Government > Military (0.75)
Government > Regional Government > North America Government > United States Government (0.39)

Technology: Information Technology > Artificial Intelligence (0.71)
